Load the necessary libraries
library(mgcv) #for GAMs
library(gratia) #for GAM plots
library(emmeans) #for marginal means etc
library(broom) #for tidy output
library(MuMIn) #for model selection and AICc
library(lubridate) #for processing dates
library(mapdata)
library(maps)
library(tidyverse) #for data wrangling
library(DHARMa) #for residual diagnostics
library(performance)
library(see)
library(sf)
library(stars)
library(rnaturalearth)
library(rnaturalearthdata)
library(raster)
library(ggspatial)
library(patchwork)
Paruelo and Lauenroth (1996) analyzed the geographic distribution and the effects of climate variables on the relative abundance of a number of plant functional types (PFT’s) including shrubs, forbs, succulents (e.g. cacti), C3 grasses and C4 grasses. They used data from 73 sites across temperate central North America (see pareulo.csv) and calculated the relative abundance of C3 grasses at each site as a response variable
grass
Format of paruelo.csv data file
| C3 | LAT | LONG | MAP | MAT | JJAMAP | DJFMAP |
|---|---|---|---|---|---|---|
| ... | ... | ... | ... | ... | ... | ... |
| C3 | - Relative abundance of C3 grasses at each site - response variable |
| LAT | - Latitudinal coordinate |
| LONG | - Longitudinal coordinate |
| MAP | - Mean annual precipitation |
| MAT | - Mean annual temperature |
| JJAMAP | - Mean annual precipitation in June, July, August |
| DJFMAP | - Mean annual precipitation in December, January, February |
paruelo = read_csv('../public/data/paruelo.csv', trim_ws=TRUE)
glimpse(paruelo)
## Rows: 73
## Columns: 7
## $ C3 <dbl> 0.65, 0.65, 0.76, 0.75, 0.33, 0.03, 0.00, 0.02, 0.05, 0.05, 0.…
## $ LAT <dbl> 46.40, 47.32, 45.78, 43.95, 46.90, 38.87, 32.62, 36.95, 35.30,…
## $ LONG <dbl> 119.55, 114.27, 110.78, 101.87, 102.82, 99.38, 106.75, 96.55, …
## $ MAP <dbl> 199, 469, 536, 476, 484, 623, 259, 969, 542, 421, 446, 376, 66…
## $ MAT <dbl> 12.4, 7.5, 7.2, 8.2, 4.8, 12.0, 14.5, 15.3, 13.9, 8.5, 5.1, 11…
## $ JJAMAP <dbl> 0.12, 0.24, 0.24, 0.35, 0.40, 0.40, 0.47, 0.30, 0.44, 0.31, 0.…
## $ DJFMAP <dbl> 0.45, 0.29, 0.20, 0.15, 0.14, 0.11, 0.17, 0.14, 0.13, 0.14, 0.…
We will focus on the spatial components.
Model formula: \[ y_i \sim{} \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i =\beta_0 + f(Long_i) + f(Lat_i) + f(Long_i, Lat_i) \]
where \(\beta_0\) is the y-intercept. \(f(Lat)\) and \(f(Long)\) indicate the additive smoothing functions of the spatial predictors.
For this example, we are going to use GAM for a spatial analysis. Therefore, in addition to the usual assumptions, we are going to want to include exploratory data analyses that represent the spatial nature of the data. One way to do this is via a bubble plot. Such a plot represents the response as circles whose relative circumfernce is proportional to the magnitude of the response value and whose x and y coordinates reflect their position in space (e.g. longitude and latitude).
In order to properly represent the spatial configuration of sampling units along with the broader spatial context, it is useful to overlay the points over a map of the landscale. The https://www.naturalearthdata.com/ is a public domain map database and the rnaturalearth package provides a convenient R interface to this database.
usa <- ne_countries(country=c('united states of america','canada'),
scale = "medium", returnclass = "sf")
ggplot() +
geom_sf(data=usa)
This includes all states and territories for both Canada and USA. Whilst this is politically correct, it is not ecologically sensible. So lets focus on just the mainlnd of USA.
usa = usa %>%
st_crop(xmin=-130, xmax=-60,
ymin=0, ymax=60)
The beta distribution has no mass at 0 and 1. Therefore if there are any 0 or 1 values (corresponding to cover of 0 or 100 percent), they need to be shrunk in a little to allow the beta to be used.
The Longitude values supplied in the dataset are in westerly units. To allow them to relate to a world map, we will need to multiply them by -1.
paruelo = paruelo %>%
mutate(mC3=ifelse(C3==0, 0.01,C3),
LONG=-1*LONG)
paruelo.sf = paruelo %>%
st_as_sf(coords=c('LONG', 'LAT'), crs=st_crs(usa))
Now we can generate the bubble plot.
ggplot() +
geom_sf(data=usa) +
geom_sf(data=paruelo.sf, aes(color=C3, size=C3)) +
scale_color_gradientn(colors=heat.colors(n=10)) +
coord_sf(expand=FALSE)
ggplot(paruelo) + geom_histogram(aes(x=C3)) +
ggplot(paruelo) + geom_histogram(aes(x=C3)) + scale_x_continuous(trans=scales::logit_trans())
Conclusions:
There are numerous options for fitting two-dimensional splines in GAMs:
s() and including two predictors. This approach:
te() and including the two predictors. This approach:
ti() with each main effect and their interaction (as separate specifications). This approach:
paruelo.gam1 = gam(mC3 ~ s(LONG,LAT), data=paruelo,
family=betar, method='REML')
k.check(paruelo.gam1)
## k' edf k-index p-value
## s(LONG,LAT) 29 4.848145 0.898749 0.085
appraise(paruelo.gam1)
Conclusions:
Unfortunately, DHARMa residuals do not directly support the beta family. However, we can still make use of its useful diagnostics if we do some of the work (generate the simulations and provide the response and predictions) ourselves.
paruelo.resids <- createDHARMa(simulatedResponse = simulate(paruelo.gam1, nsim=250),
observedResponse = paruelo$mC3,
fittedPredictedResponse = predict(paruelo.gam1))
plot(paruelo.resids)
Conclusions:
paruelo.gam2 = gam(mC3 ~ te(LONG,LAT), data=paruelo, family=betar, method='REML')
k.check(paruelo.gam2)
## k' edf k-index p-value
## te(LONG,LAT) 24 3.00004 0.7814077 0.0075
appraise(paruelo.gam2)
Conclusions:
Unfortunately, DHARMa residuals do not directly support the beta family. However, we can still make use of its useful diagnostics if we do some of the work (generate the simulations and provide the response and predictions) ourselves.
paruelo.resids <- createDHARMa(simulatedResponse = simulate(paruelo.gam2, nsim=250),
observedResponse = paruelo$mC3,
fittedPredictedResponse = predict(paruelo.gam1))
plot(paruelo.resids)
Conclusions:
paruelo.gam3 = gam(mC3 ~ ti(LONG) + ti(LAT) + ti(LONG,LAT),
data=paruelo, family=betar, method='REML')
k.check(paruelo.gam3)
## k' edf k-index p-value
## ti(LONG) 4 1.000019 0.8673167 0.1250
## ti(LAT) 4 1.000024 0.9588562 0.3475
## ti(LONG,LAT) 16 3.523281 0.9956490 0.4025
appraise(paruelo.gam3)
Conclusions:
Unfortunately, DHARMa residuals do not directly support the beta family. However, we can still make use of its useful diagnostics if we do some of the work (generate the simulations and provide the response and predictions) ourselves.
paruelo.resids <- createDHARMa(simulatedResponse = simulate(paruelo.gam3, nsim=250),
observedResponse = paruelo$mC3,
fittedPredictedResponse = predict(paruelo.gam1))
plot(paruelo.resids)
Conclusions:
plot(paruelo.gam1,scheme = 2)
vis.gam(paruelo.gam1, theta=30)
draw(paruelo.gam1)
plot(paruelo.gam2,scheme = 2)
vis.gam(paruelo.gam2, theta=30)
draw(paruelo.gam2)
plot(paruelo.gam3,scheme = 2)
vis.gam(paruelo.gam3, theta=30)
draw(paruelo.gam3)
summary(paruelo.gam1)
##
## Family: Beta regression(4.304)
## Link function: logit
##
## Formula:
## mC3 ~ s(LONG, LAT)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.1021 0.1081 -10.19 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(LONG,LAT) 4.848 6.815 59.88 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.455 Deviance explained = 56.3%
## -REML = -45.112 Scale est. = 1 n = 73
Conclusions:
tidy(paruelo.gam1)
summary(paruelo.gam2)
##
## Family: Beta regression(4.366)
## Link function: logit
##
## Formula:
## mC3 ~ te(LONG, LAT)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.1120 0.1077 -10.32 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## te(LONG,LAT) 3 3 62.04 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.466 Deviance explained = 55.4%
## -REML = -52.794 Scale est. = 1 n = 73
Conclusions:
tidy(paruelo.gam2)
summary(paruelo.gam3)
##
## Family: Beta regression(5.099)
## Link function: logit
##
## Formula:
## mC3 ~ ti(LONG) + ti(LAT) + ti(LONG, LAT)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.2330 0.1151 -10.71 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## ti(LONG) 1.000 1.000 0.378 0.53843
## ti(LAT) 1.000 1.000 73.604 < 2e-16 ***
## ti(LONG,LAT) 3.523 3.875 15.070 0.00298 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.535 Deviance explained = 63.6%
## -REML = -52.2 Scale est. = 1 n = 73
Conclusions:
tidy(paruelo.gam3)
paruelo.list = with(paruelo,
list(LAT=seq(min(LAT), max(LAT), len=100),
LONG=seq(min(LONG), max(LONG), len=100)))
newdata = emmeans(paruelo.gam1, ~LONG+LAT, at=paruelo.list, type='response') %>%
as.data.frame
newdata %>% head
ggplot(newdata, aes(y=LAT, x=LONG)) +
geom_tile(aes(fill=response)) +
geom_contour(aes(z=response)) +
scale_fill_gradientn(colors=heat.colors(10)) +
geom_point(data=paruelo,aes(fill=C3), shape=21, size=5) +
coord_equal()
newdata.sf <- rasterFromXYZ(newdata[, c("LONG", "LAT", "response")]) %>%
st_as_stars() %>%
st_set_crs(st_crs(usa))
## OR
#newdata.sf <- newdata %>%
# st_as_sf(coords=c("LONG", "LAT"), crs=st_crs(usa)) %>%
# st_rasterize()
ggplot() +
geom_sf(data=usa) +
geom_stars(data=newdata.sf)
newdata.sf <- rasterFromXYZ(newdata[, c("LONG", "LAT", "response")]) %>%
mask(usa) %>%
st_as_stars() %>%
st_set_crs(st_crs(usa))
ggplot() +
geom_sf(data=usa) +
geom_stars(data=newdata.sf) +
scale_fill_gradientn(colours=heat.colors(10), na.value=NA) +
geom_sf(data=paruelo.sf, aes(fill=C3), shape=21, size=4) +
annotation_scale(location = "bl", width_hint = 0.25) +
annotation_north_arrow(location = "bl", which_north = "true",
pad_x = unit(0.25, "in"), pad_y = unit(0.2, "in"),
style = north_arrow_fancy_orienteering) +
coord_sf(expand=FALSE) +
theme_bw()
paruelo.list = with(paruelo,
list(LAT=seq(min(LAT), max(LAT), len=100),
LONG=seq(min(LONG), max(LONG), len=100)))
newdata = emmeans(paruelo.gam2, ~LONG+LAT, at=paruelo.list, type='response') %>%
as.data.frame
newdata %>% head
ggplot(newdata, aes(y=LAT, x=LONG)) +
geom_tile(aes(fill=response)) +
geom_contour(aes(z=response)) +
scale_fill_gradientn(colors=heat.colors(10)) +
geom_point(data=paruelo,aes(fill=C3), shape=21, size=5) +
coord_equal()
newdata.sf <- rasterFromXYZ(newdata[, c("LONG", "LAT", "response")]) %>%
st_as_stars() %>%
st_set_crs(st_crs(usa))
## OR
#newdata.sf <- newdata %>%
# st_as_sf(coords=c("LONG", "LAT"), crs=st_crs(usa)) %>%
# st_rasterize()
ggplot() +
geom_sf(data=usa) +
geom_stars(data=newdata.sf)
newdata.sf <- rasterFromXYZ(newdata[, c("LONG", "LAT", "response")]) %>%
mask(usa) %>%
st_as_stars() %>%
st_set_crs(st_crs(usa))
ggplot() +
geom_sf(data=usa) +
geom_stars(data=newdata.sf) +
scale_fill_gradientn(colours=heat.colors(10), na.value=NA) +
geom_sf(data=paruelo.sf, aes(fill=C3), shape=21, size=4) +
annotation_scale(location = "bl", width_hint = 0.25) +
annotation_north_arrow(location = "bl", which_north = "true",
pad_x = unit(0.25, "in"), pad_y = unit(0.2, "in"),
style = north_arrow_fancy_orienteering) +
coord_sf(expand=FALSE) +
theme_bw()
paruelo.list = with(paruelo,
list(LAT=seq(min(LAT), max(LAT), len=100),
LONG=seq(min(LONG), max(LONG), len=100)))
newdata = emmeans(paruelo.gam3, ~LONG+LAT, at=paruelo.list, type='response') %>%
as.data.frame
newdata %>% head
ggplot(newdata, aes(y=LAT, x=LONG)) +
geom_tile(aes(fill=response)) +
geom_contour(aes(z=response)) +
scale_fill_gradientn(colors=heat.colors(10)) +
geom_point(data=paruelo,aes(fill=C3), shape=21, size=5) +
coord_equal()
newdata.sf <- rasterFromXYZ(newdata[, c("LONG", "LAT", "response")]) %>%
st_as_stars() %>%
st_set_crs(st_crs(usa))
## OR
#newdata.sf <- newdata %>%
# st_as_sf(coords=c("LONG", "LAT"), crs=st_crs(usa)) %>%
# st_rasterize()
ggplot() +
geom_sf(data=usa) +
geom_stars(data=newdata.sf)
newdata.sf <- rasterFromXYZ(newdata[, c("LONG", "LAT", "response")]) %>%
mask(usa) %>%
st_as_stars() %>%
st_set_crs(st_crs(usa))
ggplot() +
geom_sf(data=usa) +
geom_stars(data=newdata.sf) +
scale_fill_gradientn(colours=heat.colors(10), na.value=NA) +
geom_sf(data=paruelo.sf, aes(fill=C3), shape=21, size=4) +
annotation_scale(location = "bl", width_hint = 0.25) +
annotation_north_arrow(location = "bl", which_north = "true",
pad_x = unit(0.25, "in"), pad_y = unit(0.2, "in"),
style = north_arrow_fancy_orienteering) +
coord_sf(expand=FALSE) +
theme_bw()
Paruelo, J. M., and W. K. Lauenroth. 1996. “Relative Abundance of Plant Functional Types in Grasslands and Shrublands of North America.” Ecologicalapplications, no. 6: 1212–24.